83 research outputs found
Transfer from Multiple MDPs
Transfer reinforcement learning (RL) methods leverage the experience
collected on a set of source tasks to speed up RL algorithms. A simple and
effective approach is to transfer samples from source tasks and include them
in the training set used to solve a given target task. In this paper, we
investigate the theoretical properties of this transfer method and we introduce
novel algorithms adapting the transfer process on the basis of the similarity
between source and target tasks. Finally, we report illustrative experimental
results in a continuous chain problem.
Comment: 201
A Novel Confidence-Based Algorithm for Structured Bandits
We study finite-armed stochastic bandits where the rewards of each arm might
be correlated to those of other arms. We introduce a novel phased algorithm
that exploits the given structure to build confidence sets over the parameters
of the true bandit problem and rapidly discard all sub-optimal arms. In
particular, unlike standard bandit algorithms with no structure, we show that
the number of times a suboptimal arm is selected may actually be reduced thanks
to the information collected by pulling other arms. Furthermore, we show that,
in some structures, the regret of an anytime extension of our algorithm is
uniformly bounded over time. For these constant-regret structures, we also
derive a matching lower bound. Finally, we demonstrate numerically that our
approach better exploits certain structures than existing methods.
Comment: AISTATS 202
Estimating the maximum expected value in continuous reinforcement learning problems
This paper addresses the estimation of the maximum expected value of an infinite set of random variables. This estimation problem is relevant in many fields, including reinforcement learning (RL). In RL it is well known that, in some stochastic environments, a bias in the estimation error can increase the approximation error step by step, leading to large overestimates of the true action values. Recently, some approaches have been proposed to reduce this bias and obtain better action-value estimates, but they are limited to finite problems. In this paper, we leverage the recently proposed weighted estimator and Gaussian process regression to derive a new method that natively handles infinitely many random variables. We show how these techniques can be applied to RL problems with continuous states and continuous actions. To evaluate the effectiveness of the proposed approach, we perform empirical comparisons with related approaches.
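The overestimation bias mentioned in this abstract can be illustrated with a small simulation. The sketch below is not the paper's weighted estimator or its Gaussian process method; it only shows, under assumed unit-variance Gaussian noise and a finite set of variables, that taking the maximum of sample means overestimates the true maximum expected value. The function names and parameter values are hypothetical.

```python
import random
import statistics


def max_of_sample_means(true_means, samples_per_variable, rng):
    """Maximum estimator: use the largest sample mean as the estimate of max_i E[X_i]."""
    sample_means = [
        statistics.fmean(rng.gauss(mu, 1.0) for _ in range(samples_per_variable))
        for mu in true_means
    ]
    return max(sample_means)


def estimate_bias(true_means, samples_per_variable=20, trials=500, seed=0):
    """Average the maximum estimator over many trials and subtract the true maximum."""
    rng = random.Random(seed)
    estimates = [
        max_of_sample_means(true_means, samples_per_variable, rng)
        for _ in range(trials)
    ]
    return statistics.fmean(estimates) - max(true_means)


# Ten variables that all have expected value 0 (an assumed toy setting):
# the true maximum is 0, yet the estimator is positive on average.
bias = estimate_bias([0.0] * 10)
print(f"empirical bias of the maximum estimator: {bias:.3f}")
```

With a single variable the same estimator reduces to a plain sample mean and the bias disappears, which suggests the overestimate comes from maximizing over noisy estimates rather than from the noise alone.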
Learning in Non-Cooperative Configurable Markov Decision Processes
The Configurable Markov Decision Process framework includes two entities: a Reinforcement Learning agent and a configurator that can modify some environmental parameters to improve the agent's performance. This presupposes that the two actors have the same reward function. What if the configurator does not have the same intentions as the agent? This paper introduces the Non-Cooperative Configurable Markov Decision Process, a setting that allows two (possibly different) reward functions for the configurator and the agent. We then consider an online learning problem in which the configurator has to find the best among a finite set of possible configurations. We propose two learning algorithms that exploit the problem's structure to minimize the configurator's expected regret, depending on the agent's feedback. While a naive application of the UCB algorithm yields a regret that grows indefinitely over time, we show that our approach suffers only bounded regret. Furthermore, we empirically show the performance of our algorithm in simulated domains.
Best Arm Identification for Stochastic Rising Bandits
Stochastic Rising Bandits is a setting in which the values of the expected
rewards of the available options increase every time they are selected. This
framework models a wide range of scenarios in which the available options are
learning entities whose performance improves over time. In this paper, we focus
on the Best Arm Identification (BAI) problem for the stochastic rested rising
bandits. In this scenario, we are asked, given a fixed budget of rounds, to
provide a recommendation about the best option at the end of the selection
process. We propose two algorithms to tackle the above-mentioned setting,
namely R-UCBE, which resorts to a UCB-like approach, and R-SR, which employs a
successive reject procedure. We show that they provide guarantees on the
probability of properly identifying the optimal option at the end of the
learning process. Finally, we numerically validate the proposed algorithms in
synthetic and realistic environments and compare them with the currently
available BAI strategies.
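For context, a successive reject procedure for fixed-budget best arm identification can be sketched as follows. This is the classic successive rejects scheme for stationary arms, not the R-SR algorithm of the paper, which targets rising rewards. The Bernoulli arms, their means, and the budget are assumptions chosen only for illustration.

```python
import math
import random


def successive_rejects(pull, num_arms, budget):
    """Classic successive rejects: discard the empirically worst arm after each phase."""
    log_bar = 0.5 + sum(1.0 / i for i in range(2, num_arms + 1))
    active = list(range(num_arms))
    sums = [0.0] * num_arms
    counts = [0] * num_arms
    prev_n = 0
    for phase in range(1, num_arms):
        # Cumulative number of pulls each surviving arm should have after this phase.
        n_k = math.ceil((budget - num_arms) / (log_bar * (num_arms + 1 - phase)))
        for arm in active:
            for _ in range(n_k - prev_n):
                sums[arm] += pull(arm)
                counts[arm] += 1
        prev_n = n_k
        # Reject the active arm with the lowest empirical mean.
        worst = min(active, key=lambda a: sums[a] / counts[a])
        active.remove(worst)
    return active[0]


# Hypothetical stationary Bernoulli arms; arm 3 has the highest mean.
rng = random.Random(0)
means = [0.1, 0.2, 0.3, 0.9]
best = successive_rejects(
    lambda arm: 1.0 if rng.random() < means[arm] else 0.0,
    num_arms=len(means),
    budget=2000,
)
print("recommended arm:", best)
```

The phase lengths are chosen so that the total number of pulls stays within the budget, and the single arm left after the last phase is the recommendation.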
A Nanocryotron Ripple Counter Integrated with a Superconducting Nanowire Single-Photon Detector for Megapixel Arrays
Decreasing the number of cables that bring heat into the cryocooler is a
critical issue for all cryoelectronic devices. In particular, arrays of
superconducting nanowire single-photon detectors (SNSPDs) could require more
than readout lines. Performing signal processing operations at low
temperatures could be a solution. Nanocryotrons, superconducting nanowire
three-terminal devices, are good candidates for integrating sensing and
electronics on the same technological platform as SNSPDs in photon-counting
applications. In this work, we demonstrated that it is possible to read out,
process, encode, and store the output of SNSPDs using exclusively
superconducting nanowires. In particular, we present the design and development
of a nanocryotron ripple counter that detects input voltage spikes and converts
the number of pulses to an -digit value. The counting base can be tuned from
2 to higher values, enabling higher maximum counts without enlarging the
circuit. As a proof-of-principle, we first experimentally demonstrated the
building block of the counter, an integer- frequency divider with
ranging from 2 to 5. Then, we demonstrated photon-counting operations at
405 nm and 1550 nm by coupling an SNSPD with a 2-digit nanocryotron counter
partially integrated on-chip. The 2-digit counter operated in either base 2 or
base 3 with a bit error rate lower than and a maximum count
rate of s. We simulated circuit architectures for
integrated readout of the counter state, and we evaluated the capabilities of
reading out an SNSPD megapixel array that would collect up to counts
per second. The results of this work, combined with our recent publications on
a nanocryotron shift register and logic gates, pave the way for the development
of nanocryotron processors, from which multiple superconducting platforms may
benefit.
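The counting logic described in this abstract, where each stage divides the pulse stream by the counting base and passes a carry to the next stage, can be mimicked in software. The sketch below is only a behavioral analogy under that assumption; it does not model the superconducting circuit, and the function names are hypothetical.

```python
def count_pulses(num_pulses, base, num_digits):
    """Simulate a ripple counter; returns the digits, least significant first."""
    digits = [0] * num_digits
    for _ in range(num_pulses):
        for i in range(num_digits):
            digits[i] += 1
            if digits[i] < base:
                break
            # This stage overflowed: reset it and let the carry ripple onward.
            digits[i] = 0
    return digits


def digits_to_int(digits, base):
    """Convert a least-significant-first digit list back to an integer."""
    return sum(d * base**i for i, d in enumerate(digits))


state = count_pulses(5, base=3, num_digits=2)
print(state, digits_to_int(state, base=3))  # [2, 1] 5
```

A 2-digit counter in base 3 holds counts up to 8 and wraps to zero on the ninth pulse, whereas the same two stages in base 2 hold only up to 3, which is why a higher base raises the maximum count without adding stages.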